Revised Loss Bounds for the Set Covering Machine and Sample-Compression Loss Bounds for Imbalanced Data
Authors
Abstract
Marchand and Shawe-Taylor (2002) have proposed a loss bound for the set covering machine that has the property of depending on the observed fraction of positive examples and on what the classifier achieves on the positive training examples. We show that this loss bound is incorrect. We then propose a loss bound, valid for any sample-compression learning algorithm (including the set covering machine), that depends on the observed fraction of positive examples and on what the classifier achieves on them. We also compare numerically the loss bound proposed in this paper with the incorrect bound, the original SCM bound, and a recently proposed loss bound of Marchand and Sokolova (2005) (which does not depend on the observed fraction of positive examples), and show that the latter loss bounds can be substantially larger than the new bound in the presence of imbalanced misclassifications.
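For context, sample-compression bounds of the kind discussed in this abstract typically take the following generic form (this is the standard Littlestone–Warmuth-style bound, shown only for illustration; it is not the revised bound proposed in the paper): with probability at least $1-\delta$ over a sample of $m$ examples, a classifier reconstructed from a compression set of $d$ examples that misclassifies $k$ of the remaining $m-d$ training examples satisfies

\[
R(f) \;\le\; 1-\exp\!\left(\frac{-1}{m-d-k}\left[\ln\binom{m}{d}+\ln\binom{m-d}{k}+\ln\frac{1}{\delta_{d,k}}\right]\right),
\]

where $\delta_{d,k}$ denotes the share of the confidence parameter $\delta$ allocated to compression sets of size $d$ with $k$ errors. The bound proposed in the paper refines this generic picture by additionally conditioning on the observed fraction of positive examples and on the classifier's performance on them.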
Similar articles
Sparsity in machine learning: theory and practice
The thesis explores sparse machine learning algorithms for supervised (classification and regression) and unsupervised (subspace methods) learning. For classification, we review the set covering machine (SCM) and propose new algorithms that directly minimise the SCM's sample-compression generalisation error bounds during the training phase. Two of the resulting algorithms are proved to pr...
Bounds on the outer-independent double Italian domination number
An outer-independent double Italian dominating function (OIDIDF) on a graph $G$ with vertex set $V(G)$ is a function $f:V(G)\longrightarrow\{0,1,2,3\}$ such that if $f(v)\in\{0,1\}$ for a vertex $v\in V(G)$ then $\sum_{u\in N[v]}f(u)\geq 3$, and the set $\{u\in V(G)\mid f(u)=0\}$ is independent. The weight of an OIDIDF $f$ is the value $w(f)=\sum_{v\in V(G)}f(v)$. The minimum weight of an OIDIDF on a graph $G$ is cal...
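The two defining conditions of an OIDIDF can be checked mechanically. The sketch below (graph representation and helper name are my own, not from the abstract) verifies them on a small path graph:

```python
# Illustrative check of the OIDIDF conditions, using adjacency lists.

def is_oididf(adj, f):
    """Return True iff f: V -> {0,1,2,3} is an outer-independent
    double Italian dominating function on the graph given by adj."""
    n = len(adj)
    # Values must lie in {0,1,2,3}.
    if any(f[v] not in (0, 1, 2, 3) for v in range(n)):
        return False
    # If f(v) in {0,1}, the closed neighborhood N[v] must sum to >= 3.
    for v in range(n):
        if f[v] in (0, 1) and f[v] + sum(f[u] for u in adj[v]) < 3:
            return False
    # The vertices with f(v) = 0 must form an independent set.
    zeros = {v for v in range(n) if f[v] == 0}
    for v in zeros:
        if any(u in zeros for u in adj[v]):
            return False
    return True

# Path graph P3: 0 - 1 - 2.
adj = [[1], [0, 2], [1]]
print(is_oididf(adj, [0, 3, 0]))  # True: a valid OIDIDF of weight w(f) = 3
print(is_oididf(adj, [0, 2, 0]))  # False: N[0] sums to only 2
```

The second call fails because vertex 0 has $f(0)=0$ but its closed neighborhood carries total weight 2, violating the $\geq 3$ requirement.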
Estimating a Bounded Normal Mean Relative to Squared Error Loss Function
Let be a random sample from a normal distribution with unknown mean and known variance. The usual estimator of the mean, i.e., the sample mean, is the maximum likelihood estimator, which under the squared error loss function is a minimax and admissible estimator. In many practical situations, the mean is known in advance to lie in an interval, say for some. In this case, the maximum likelihood estimator...
Efficiency Evaluation and Ranking DMUs in the Presence of Interval Data with Stochastic Bounds
On account of the existence of uncertainty, DEA occasionally faces the situation of imprecise data, especially when a set of DMUs includes missing data, ordinal data, interval data, stochastic data, or fuzzy data. Therefore, how to evaluate the efficiency of a set of DMUs in interval environments is a problem worth studying. In this paper, we discuss a new method for evaluation and ranking i...
Bayesian Two-Sample Prediction with Progressively Type-II Censored Data for Some Lifetime Models
Prediction on the basis of censored data is a very important topic in many fields, including the medical and engineering sciences. In this paper, based on the progressive Type-II right censoring scheme, we discuss Bayesian two-sample prediction. A general form for lifetime models, including some well-known and useful models such as Weibull and Pareto, is considered for obtaining prediction bounds ...
Journal:
- Journal of Machine Learning Research
Volume 8, Issue
Pages -
Publication date: 2007